Generalized Yule-Walker Estimation for Spatio-Temporal Models with Unknown Diagonal Coefficients
We consider a class of spatio-temporal models which extend popular
econometric spatial autoregressive panel data models by allowing the scalar
coefficients for each location (or panel) different from each other. To
overcome the innate endogeneity, we propose a generalized Yule-Walker
estimation method which applies the least squares estimation to a Yule-Walker
equation. The asymptotic theory is developed under the setting that both the
sample size and the number of locations (or panels) tend to infinity under a
general setting for stationary and alpha-mixing processes, which includes
spatial autoregressive panel data models driven by i.i.d. innovations as
special cases. The proposed methods are illustrated using both simulated and
real data.
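As background for the abstract above (which generalizes this idea to spatio-temporal panels), the classical Yule-Walker approach for a plain AR(p) process can be sketched in a few lines: build the sample autocovariance system and solve it by least squares. This is an illustrative sketch of the textbook estimator, not the paper's generalized method; the function name and the AR(1) simulation are illustrative.

```python
import numpy as np

def yule_walker_ar(x, p):
    """Estimate AR(p) coefficients by solving the Yule-Walker
    equations Gamma a = gamma with least squares (illustrative)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # sample autocovariances gamma_0 .. gamma_p
    gamma = np.array([x[:n - k] @ x[k:] / n for k in range(p + 1)])
    # Toeplitz system built from the autocovariances
    Gamma = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    a, *_ = np.linalg.lstsq(Gamma, gamma[1:], rcond=None)
    return a

# simulate an AR(1) process with coefficient 0.6 and recover it
rng = np.random.default_rng(0)
x = np.zeros(5000)
for t in range(1, 5000):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()
a_hat = yule_walker_ar(x, 1)
```

The paper's contribution is precisely that this least-squares-on-Yule-Walker idea can be made to work when each location has its own unknown coefficient and the regressors are endogenous.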
Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation
Fine-grained, span-level human evaluation has emerged as a reliable and
robust method for evaluating text generation tasks such as summarization,
simplification, machine translation and news generation, and the derived
annotations have been useful for training automatic metrics and improving
language models. However, existing annotation tools implemented for these
evaluation frameworks lack the adaptability to be extended to different domains
or languages, or to modify annotation settings according to user needs.
Moreover, the absence of a unified annotated data format inhibits research in
multi-task learning. In this paper, we introduce Thresh, a unified,
customizable and
deployable platform for fine-grained evaluation. By simply creating a YAML
configuration file, users can build and test an annotation interface for any
framework within minutes -- all in one web browser window. To facilitate
collaboration and sharing, Thresh provides a community hub that hosts a
collection of fine-grained frameworks and corresponding annotations made and
collected by the community, covering a wide range of NLP tasks. For deployment,
Thresh offers multiple options for any scale of annotation projects from small
manual inspections to large crowdsourcing ones. Additionally, we introduce a
Python library to streamline the entire process from typology design and
deployment to annotation processing. Thresh is publicly accessible at
https://thresh.tools
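To make the idea of a unified span-level annotation format concrete, here is a minimal sketch of what one annotated record might look like. All field names here are hypothetical illustrations, not Thresh's actual schema or API (see https://thresh.tools for the real format).

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class SpanAnnotation:
    """One fine-grained, span-level judgment on a system output.
    Field names are illustrative, not Thresh's actual schema."""
    category: str   # e.g. "lexical_simplification", "fluency_error"
    start: int      # character offset into the output text
    end: int
    rating: int     # annotator's quality score for this span

@dataclass
class AnnotatedExample:
    source: str
    output: str
    task: str       # e.g. "simplification", "translation"
    annotations: list = field(default_factory=list)

ex = AnnotatedExample(
    source="The feline reposed upon the rug.",
    output="The cat sat on the rug.",
    task="simplification",
)
ex.annotations.append(SpanAnnotation("lexical_simplification", 4, 7, 3))
record = json.dumps(asdict(ex))  # one shareable JSON record
```

A shared record shape like this is what makes pooling annotations from different frameworks for multi-task learning straightforward.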
Improving Large-scale Paraphrase Acquisition and Generation
This paper addresses the quality issues in existing Twitter-based paraphrase
datasets, and discusses the necessity of using two separate definitions of
paraphrase for identification and generation tasks. We present a new
Multi-Topic Paraphrase in Twitter (MultiPIT) corpus that consists of a total of
130k sentence pairs with crowdsourcing (MultiPIT_crowd) and expert
(MultiPIT_expert) annotations using two different paraphrase definitions for
paraphrase identification, in addition to a multi-reference test set
(MultiPIT_NMR) and a large automatically constructed training set
(MultiPIT_Auto) for paraphrase generation. With improved data annotation
quality and task-specific paraphrase definition, the best pre-trained language
model fine-tuned on our dataset achieves the state-of-the-art performance of
84.2 F1 for automatic paraphrase identification. Furthermore, our empirical
results also demonstrate that the paraphrase generation models trained on
MultiPIT_Auto generate more diverse and high-quality paraphrases compared to
their counterparts fine-tuned on other corpora such as Quora, MSCOCO, and
ParaNMT.
Comment: The project webpage is at http://twitter-paraphrase.com/. Accepted at
EMNLP 202
Mechanical properties improvement of ground Tire Rubber/Thermoplastic composites produced by rotational molding
In this work, ground tire rubber (GTR)/thermoplastic composites were successfully produced by combining a dry-blending technique with a rotational molding process.
In order to improve the mechanical properties of the resulting composites, different modification methods were used. From the rotomolded composites produced, a complete set of characterization including morphological, physical (density and hardness) and mechanical (tensile, flexural and impact) properties was performed. The first part of the work investigated the effect of chemical blowing agent and maple wood fiber concentrations, as well as two GTR surface treatments (maleated polyethylene (MAPE) in solution and microwave irradiation), on the mechanical properties of GTR/linear low density polyethylene (LLDPE) composites. The second part of the work studied the effect of MAPE-treated GTR on the mechanical properties of GTR/polypropylene (PP) composites. Overall, the results showed that MAPE treatment of the GTR was an effective approach for improving the compatibility and interfacial adhesion of GTR/thermoplastic composites. For example, the impact strength of the LLDPE/GTR (85/15) composite improved by 30% with the addition of 0.3 wt.% MAPE, compared to the composite with the same GTR content without MAPE treatment. Similarly, a 52% improvement in impact strength was obtained for the PP/GTR (50/50) composite by introducing 2 wt.% MAPE, compared to the composite with the same content of untreated GTR.
Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA
Large language models (e.g., GPT-4) are uniquely capable of producing highly
rated text simplification, yet current human evaluation methods fail to provide
a clear understanding of systems' specific strengths and weaknesses. To address
this limitation, we introduce SALSA, an edit-based human annotation framework
that enables holistic and fine-grained text simplification evaluation. We
develop twenty-one linguistically grounded edit types, covering the full
spectrum of success and failure across dimensions of conceptual, syntactic and
lexical simplicity. Using SALSA, we collect 19K edit annotations on 840
simplifications, revealing discrepancies in the distribution of simplification
strategies performed by fine-tuned models, prompted LLMs and humans, and find
GPT-3.5 performs more high-quality edits than humans, but still exhibits frequent
errors. Using our fine-grained annotations, we develop LENS-SALSA, a
reference-free automatic simplification metric, trained to predict sentence-
and word-level quality simultaneously. Additionally, we introduce word-level
quality estimation for simplification and report promising baseline results.
Our data, new metric, and annotation toolkit are available at
https://salsa-eval.com.
Comment: Accepted to EMNLP 202
MultiTalk: A Highly-Branching Dialog Testbed for Diverse Conversations
We study conversational dialog in which there are many possible responses to
a given history. We present the MultiTalk Dataset, a corpus of over 320,000
sentences of written conversational dialog that balances a high branching
factor (10) with several conversation turns (6) through selective branch
continuation. We make multiple contributions to study dialog generation in the
highly branching setting. In order to evaluate a diverse set of generations, we
propose a simple scoring algorithm, based on bipartite graph matching, to
optimally incorporate a set of diverse references. We study multiple language
generation tasks at different levels of predictive conversation depth, using
textual attributes induced automatically from pretrained classifiers. Our
culminating task is a challenging theory of mind problem, a controllable
generation task which requires reasoning about the expected reaction of the
listener.
Comment: 7 pages, AAAI-2
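The bipartite-matching idea mentioned above, scoring a set of diverse generations by optimally pairing them with a set of diverse references, can be sketched with a brute-force optimal assignment. This is an illustrative stand-in, not the paper's exact algorithm: the toy unigram-F1 similarity substitutes for whatever learned or n-gram metric is actually used, and the exhaustive search over assignments only works for small sets (a real implementation would use the Hungarian algorithm).

```python
from itertools import permutations

def token_f1(a, b):
    """Toy similarity: unigram F1 overlap (stand-in for BLEU/BERTScore)."""
    sa, sb = set(a.split()), set(b.split())
    overlap = len(sa & sb)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(sa), overlap / len(sb)
    return 2 * p * r / (p + r)

def optimal_matching_score(candidates, references, sim):
    """Score generated candidates against diverse references via the
    one-to-one assignment maximizing total similarity (brute force)."""
    assert len(candidates) <= len(references)
    best = 0.0
    for perm in permutations(range(len(references)), len(candidates)):
        total = sum(sim(candidates[i], references[j])
                    for i, j in enumerate(perm))
        best = max(best, total)
    return best / len(candidates)

cands = ["how are you", "see you later"]
refs = ["how are you doing", "good to see you", "what is up"]
score = optimal_matching_score(cands, refs, token_f1)
```

The key property is that each reference can reward at most one candidate, so a system is credited for covering distinct responses rather than for producing many near-duplicates of the single best reference.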
Automatic and Human-AI Interactive Text Generation
In this tutorial, we focus on text-to-text generation, a class of natural
language generation (NLG) tasks, that takes a piece of text as input and then
generates a revision that is improved according to some specific criteria
(e.g., readability or linguistic styles), while largely retaining the original
meaning and the length of the text. This includes many useful applications,
such as text simplification, paraphrase generation, style transfer, etc. In
contrast to text summarization and open-ended text completion (e.g., story),
the text-to-text generation tasks we discuss in this tutorial are more
constrained in terms of semantic consistency and targeted language styles. This
level of control makes these tasks ideal testbeds for studying the ability of
models to generate text that is both semantically adequate and stylistically
appropriate. Moreover, these tasks are interesting from a technical standpoint,
as they require complex combinations of lexical and syntactical
transformations, stylistic control, and adherence to factual knowledge -- all
at once. With a special focus on text simplification and revision, this
tutorial aims to provide an overview of the state-of-the-art natural language
generation research from four major aspects -- Data, Models, Human-AI
Collaboration, and Evaluation -- and to discuss and showcase a few significant
and recent advances: (1) the use of non-autoregressive approaches; (2) the shift
from fine-tuning to prompting with large language models; (3) the development
of new learnable metrics and fine-grained human evaluation frameworks; (4) a
growing body of studies and datasets on non-English languages; (5) the rise of
HCI+NLP+Accessibility interdisciplinary research to create real-world writing
assistant systems.
Comment: To appear at ACL 2024, Tutorial
LENS: A Learnable Evaluation Metric for Text Simplification
Training learnable metrics using modern language models has recently emerged
as a promising method for the automatic evaluation of machine translation.
However, existing human evaluation datasets for text simplification have
limited annotations that are based on unitary or outdated models, making them
unsuitable for this approach. To address these issues, we introduce the
SimpEval corpus that contains: SimpEval_past, comprising 12K human ratings on
2.4K simplifications of 24 past systems, and SimpEval_2022, a challenging
simplification benchmark consisting of over 1K human ratings of 360
simplifications including GPT-3.5 generated text. Training on SimpEval, we
present LENS, a Learnable Evaluation Metric for Text Simplification. Extensive
empirical results show that LENS correlates much better with human judgment
than existing metrics, paving the way for future progress in the evaluation of
text simplification. We also introduce Rank and Rate, a human evaluation
framework that rates simplifications from several models in a list-wise manner
using an interactive interface, which ensures both consistency and accuracy in
the evaluation process and is used to create the SimpEval datasets.
Comment: Accepted at ACL 202
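The claim that a learnable metric "correlates much better with human judgment" is typically quantified with a correlation coefficient (Pearson, Spearman, or Kendall) between metric scores and human ratings over the same outputs. The sketch below computes Pearson correlation from scratch; the toy scores are made up for illustration and are not SimpEval data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# toy example: metric scores vs. human ratings for five simplifications
metric = [0.91, 0.42, 0.77, 0.58, 0.30]
human = [90, 35, 80, 55, 40]
r = pearson(metric, human)
```

A metric meta-evaluation like the one described reports exactly this kind of coefficient for each candidate metric, and the metric with the highest correlation against the human ratings is judged the most reliable.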